DTSA 5511 - Deep Learning - CNN Cancer Detection Mini-Project¶
Contents¶
- Setup and imports
- Introduction and overview of the problem
- Description of the data and EDA
- Modeling
- Results and analysis
1. Setup and imports¶
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
gpu_count = 0
if gpu_info.find('command not found') >= 0:
    print('CUDA not installed')
elif gpu_info.find('failed') >= 0:
    print('Not connected to a GPU')
else:
    print(gpu_info)
    gpu_count_lines = !nvidia-smi -L | wc -l
    gpu_count = int(gpu_count_lines[0])  # shell capture returns a list of strings; convert to int
print('GPU count:', gpu_count)
Sun Mar 10 18:32:17 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-SXM2-16GB Off | 00000000:00:04.0 Off | 0 |
| N/A 33C P0 25W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
GPU count: 1
from google.colab import userdata, files, drive
# drive.mount('/content/drive')
import os
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
import kaggle
%%time
# !mkdir /root/.kaggle
# !cp /content/drive/MyDrive/kaggle/kaggle.json /root/.kaggle/
# !chmod 600 /root/.kaggle/kaggle.json
#!kaggle competitions download -c histopathologic-cancer-detection
!unzip -q /content/drive/MyDrive/kaggle/histopathologic-cancer-detection/histopathologic-cancer-detection.zip
CPU times: user 580 ms, sys: 102 ms, total: 682 ms
Wall time: 1min 53s
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
from PIL import Image
import os
from joblib import dump, load
import plotly.express as px
import plotly.graph_objects as go
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Dense, Dropout, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import ExponentialDecay
2. Introduction and overview of the problem¶
In this mini-project, we will participate in the Kaggle competition named "Histopathologic Cancer Detection". The competition can be found at: https://www.kaggle.com/competitions/histopathologic-cancer-detection/overview.
The objective of this competition is to determine whether sample images contain cancerous cells. While Convolutional Neural Networks have made huge advances in image processing and classification over the last 5-10 years, detecting cancer in medical images remains a challenging, active area of research.
We will train a baseline CNN and report on its performance. Then we will try different variations on model architecture and learning rates to see if we can improve the performance. Finally, we will make predictions on the Kaggle competition's test set and submit them for scoring.
3. Description of the data and EDA¶
The training dataset consists of 220,025 images along with a binary label to indicate whether the image contains metastatic cancer cells. Each image is 96x96 pixels times 3 channels (red, green and blue). There are 130,908 negative cases and 89,117 positive cases in the training set, which represents a slight class imbalance, but not something that will require resampling. There are no missing labels and no duplicates.
Looking at a sample of positive and negative cases reveals that these are indeed images of cells. To my untrained eye, there is no discernible difference between cancerous and non-cancerous cells. Most of the slides show cells with a purple tint, most likely from the staining process used to make the cells' internal structures visible. This purple cast is clearly visible in the RGB histogram, where the distributions of the blue and red pixels are clearly shifted to the right.
The test dataset consists of 57,458 unlabeled images of the same size as those in the training set.
labels_df = pd.read_csv('/content/train_labels.csv')
labels_df['label_bool'] = labels_df['label'].astype(bool)
labels_df['label_bin'] = labels_df['label'].astype(str)
labels_df['file_name'] = labels_df['id'] + '.tif'
labels_df
| | id | label | label_bool | label_bin | file_name |
|---|---|---|---|---|---|
| 0 | f38a6374c348f90b587e046aac6079959adf3835 | 0 | False | 0 | f38a6374c348f90b587e046aac6079959adf3835.tif |
| 1 | c18f2d887b7ae4f6742ee445113fa1aef383ed77 | 1 | True | 1 | c18f2d887b7ae4f6742ee445113fa1aef383ed77.tif |
| 2 | 755db6279dae599ebb4d39a9123cce439965282d | 0 | False | 0 | 755db6279dae599ebb4d39a9123cce439965282d.tif |
| 3 | bc3f0c64fb968ff4a8bd33af6971ecae77c75e08 | 0 | False | 0 | bc3f0c64fb968ff4a8bd33af6971ecae77c75e08.tif |
| 4 | 068aba587a4950175d04c680d38943fd488d6a9d | 0 | False | 0 | 068aba587a4950175d04c680d38943fd488d6a9d.tif |
| ... | ... | ... | ... | ... | ... |
| 220020 | 53e9aa9d46e720bf3c6a7528d1fca3ba6e2e49f6 | 0 | False | 0 | 53e9aa9d46e720bf3c6a7528d1fca3ba6e2e49f6.tif |
| 220021 | d4b854fe38b07fe2831ad73892b3cec877689576 | 1 | True | 1 | d4b854fe38b07fe2831ad73892b3cec877689576.tif |
| 220022 | 3d046cead1a2a5cbe00b2b4847cfb7ba7cf5fe75 | 0 | False | 0 | 3d046cead1a2a5cbe00b2b4847cfb7ba7cf5fe75.tif |
| 220023 | f129691c13433f66e1e0671ff1fe80944816f5a2 | 0 | False | 0 | f129691c13433f66e1e0671ff1fe80944816f5a2.tif |
| 220024 | a81f84895ddcd522302ddf34be02eb1b3e5af1cb | 1 | True | 1 | a81f84895ddcd522302ddf34be02eb1b3e5af1cb.tif |
220025 rows × 5 columns
labels_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 220025 entries, 0 to 220024
Data columns (total 5 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   id          220025 non-null  object
 1   label       220025 non-null  int64
 2   label_bool  220025 non-null  bool
 3   label_bin   220025 non-null  object
 4   file_name   220025 non-null  object
dtypes: bool(1), int64(1), object(3)
memory usage: 6.9+ MB
labels_df['label'].value_counts()
0    130908
1     89117
Name: label, dtype: int64
fig = px.histogram(labels_df, x='label_bool', histnorm='probability density')
fig.show()
labels_df[labels_df['label']==True].sample(5)['id']
53363     1129c7582bc6462a4fa8018b8d7b6bb2ac7763a9
20063     15d75b174c7e6a0d49fea7a7c471caeaa75d4ef0
148111    0f0cc8f02598b4d3ac1d0b2e5875e4d3552c0fe6
164603    a160de3dc02ae5c87bc70684685459b52698e7f2
197040    1ea6f119bd3b2996e64e74d0d2bf40688cc26d37
Name: id, dtype: object
# Look at some examples with tumors
fig = plt.figure(figsize=(12, 9))
for i, name in enumerate(labels_df[labels_df['label']==True].sample(5)['id']):
    ax = fig.add_subplot(1, 5, i+1)
    im = Image.open('/content/train/' + name + '.tif')
    plt.imshow(im)
# And some without
fig = plt.figure(figsize=(12, 9))
for i, name in enumerate(labels_df[labels_df['label']==False].sample(5)['id']):
    ax = fig.add_subplot(1, 5, i+1)
    im = Image.open('/content/train/' + name + '.tif')
    plt.imshow(im)
img_file_name = labels_df[labels_df['label']==True].sample(1)['id'].iloc[0] + '.tif'
img = np.array(Image.open('/content/train/' + img_file_name))
img.shape
(96, 96, 3)
from plotly.subplots import make_subplots
fig = make_subplots(1, 2)
fig.add_trace(go.Image(z=img), 1, 1)
for channel, color in enumerate(['red', 'green', 'blue']):
    fig.add_trace(go.Histogram(x=img[..., channel].ravel(), opacity=0.5,
                               marker_color=color, name='%s channel' % color), 1, 2)
fig.update_layout(height=400)
fig.show()
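The purple cast can also be quantified numerically by comparing per-channel means. A minimal sketch (a synthetic purple-tinted array stands in here for a real patch like `img` above, so the block is self-contained):

```python
import numpy as np

def channel_means(img):
    """Mean intensity per RGB channel of an (H, W, 3) array."""
    return img.reshape(-1, 3).mean(axis=0)

# Synthetic stand-in for a stained patch: red and blue elevated relative to green.
rng = np.random.default_rng(0)
patch = np.clip(rng.normal(loc=[180, 120, 190], scale=20, size=(96, 96, 3)), 0, 255)

r, g, b = channel_means(patch)
print(f"R={r:.1f} G={g:.1f} B={b:.1f}")  # red and blue dominate -> purple tint
```

On the real training patches, red and blue means sitting well above green would corroborate what the histogram shows visually.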
test_df = pd.DataFrame(os.listdir('/content/test/'), columns=['file_name'])
test_df
| | file_name |
|---|---|
| 0 | 1c9de83a0cb3e8918884719a158fc4cad3f9d1af.tif |
| 1 | b383c963d3236b55a941f9a9503d198ff5491116.tif |
| 2 | 35b99f7e8df4882ade0ff57aa0f2ae511911b371.tif |
| 3 | f78b7600773617b56ec78af1cea5e827422a2a80.tif |
| 4 | 9c2041bad259eecdf62dcd24c9a75c14a37b6363.tif |
| ... | ... |
| 57453 | 680382b1e26f22d8b36f5b809b04d6920bc78607.tif |
| 57454 | 9ee5a93349fb649335787585ed5f77d6a8185054.tif |
| 57455 | a3a4e7a165fdb629fa995317e3393372f19ad267.tif |
| 57456 | 35a2ab6b18fd10d3125144146711e62c79edc52c.tif |
| 57457 | f2614e68667e9c980b5f9ee61b09d19c96aed067.tif |
57458 rows × 1 columns
test_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57458 entries, 0 to 57457
Data columns (total 1 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   file_name  57458 non-null  object
dtypes: object(1)
memory usage: 449.0+ KB
4. Modeling¶
To make predictions on the test images, we will train a convolutional neural network using TensorFlow and Keras. We will start by creating ImageDataGenerator objects for the training, validation and test data sets. We will then train a small CNN model to make sure the pipelines work and to establish a baseline. From there, we will iteratively try to improve the predictions by tweaking the model architecture and hyperparameters.
Iteration 1:
- The basic architecture is two repetitions of a convolutional layer followed by a max pooling layer for the feature extraction section. The classification section consists of one fully connected layer with 32 units. For this baseline model, I wanted something with a low parameter count to keep the training time down. This was accomplished by using 2x2 strides.
- At the end of 10 epochs, the training and validation scores are still improving, indicating that we are underfitting and a more complex model is justified.
Iteration 2:
- In order to increase model complexity, I added a third repetition of a convolutional layer plus max pooling layer. All other factors unchanged.
- Not a huge improvement over Iteration 1.
Iteration 3:
- Changed strides from 2x2 to 1x1. This resulted in a 10x increase in the number of parameters in the model (from 100k to 1M).
- Slight increase in performance, but no signs of overfitting yet, so we will continue to increase model complexity.
Iteration 4:
- Added a convolutional layer to each of the three repetitions in the feature extraction section, plus an additional fully connected layer in the classifier. Parameter count increases to 1.3M.
- Significant increase in accuracy score from 0.87 to 0.94.
Iteration 5:
- Went back to one Conv layer for each of the three iterations, but increased the number of filters from (32, 64, 128) to (64, 128, 256). Number of parameters explodes to 3.7M.
- Validation accuracy score drops to 0.91 and peaks after 4 epochs, indicating possible overfitting and that 3.7M parameters may be too many.
Iteration 6:
- Went back to the architecture from Iteration 4, but with 4 layers in the classification section. Exploring the idea that the feature extraction section is performing well, but that we could benefit from more complexity in the classification part.
- No significant improvement over Iteration 4. Since this model has significantly more parameters than Iteration 4 without any improvement in evaluation metric, we will stick with Iteration 4 as the best architecture so far.
Iteration 7:
- Go back to Iteration 4 architecture, but change learning rate to 0.01 (from 0.001).
- Accuracy flatlines around 0.6. This model is not learning anything.
Iteration 8:
- Since a higher learning rate destroys the model's ability to learn, let's try going the other way. Implemented a learning rate scheduler using exponential decay.
- Accuracy score improved to 0.94, with room to grow.
Iteration 9:
- Since we don't see signs of overfitting, let's try training for more epochs: 20 instead of 10. If the validation accuracy plateaus and starts to drop, we know we've gone too far. Fixed learning rate at 0.001.
- Accuracy score peaks at 0.94 around epoch 14, but doesn't drop off significantly.
Iteration 10:
- Combine ideas from iterations 8 and 9. Train to 20 epochs, but use exponential decay learning rate scheduler to try to narrow down on that local minimum without overshooting.
- Results: best accuracy yet, 0.943 at epoch 18, without significant dropoff in later epochs.
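Iterations 8 and 10 rely on an exponentially decaying learning rate. As a sketch of the underlying formula (this mirrors what Keras's `ExponentialDecay(initial_learning_rate, decay_steps, decay_rate, staircase)` schedule computes; the specific numbers below are illustrative, not necessarily the ones used in the runs above):

```python
def exponential_decay(initial_lr, decay_steps, decay_rate, step, staircase=False):
    """lr(step) = initial_lr * decay_rate ** (step / decay_steps)."""
    # With staircase=True the exponent is an integer, so the rate drops in steps.
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent

# Illustrative: start at 1e-3 and multiply by 0.9 every 688 steps (one epoch here).
print(exponential_decay(1e-3, 688, 0.9, 0))     # 0.001 at the start
print(exponential_decay(1e-3, 688, 0.9, 6880))  # after 10 epochs: 1e-3 * 0.9**10
```

The smooth (non-staircase) variant shrinks the learning rate a little every batch, which is what lets later epochs "narrow down on that local minimum without overshooting."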
generator = ImageDataGenerator(rescale=1./255, validation_split=0.2)
%%time
batch_size = 256
trn_data_gen = generator.flow_from_dataframe(dataframe=labels_df,
target_size=(96,96),
x_col='file_name',
y_col='label_bin',
directory='/content/train/',
subset='training',
batch_size=batch_size,
class_mode='binary',
seed=42)
val_data_gen = generator.flow_from_dataframe(dataframe=labels_df,
target_size=(96,96),
x_col='file_name',
y_col='label_bin',
directory='/content/train/',
subset='validation',
batch_size=batch_size,
class_mode='binary',
seed=42)
tst_data_gen = generator.flow_from_dataframe(dataframe=test_df,
target_size=(96,96),
x_col='file_name',
y_col=None,
directory='/content/test/',
subset=None,
batch_size=batch_size,
class_mode=None,
seed=42,
shuffle=False)
Found 176020 validated image filenames belonging to 2 classes.
Found 44005 validated image filenames belonging to 2 classes.
Found 57458 validated image filenames.
CPU times: user 2.17 s, sys: 1.15 s, total: 3.31 s
Wall time: 3.31 s
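These split sizes determine the number of batches per epoch (the `688/688` seen in the training logs below). A quick sanity check with `batch_size = 256`:

```python
import math

batch_size = 256
# Split sizes reported by flow_from_dataframe above.
for name, n in [('train', 176020), ('validation', 44005), ('test', 57458)]:
    # The last, partial batch still counts, hence the ceiling.
    print(name, math.ceil(n / batch_size), 'batches')
```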
# Iteration 1: Small model
# Arch: 2x2 strides, 2 conv layers, 1 fully connected layer with 32 units
# Compile options: Adam, BCE
strides = (2,2)
model = Sequential([
Input((96,96,3), name='Input'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1'),
MaxPooling2D((2,2), name='Max_Pool_1'),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2', ),
MaxPooling2D((2,2), name='Max_Pool_2'),
Flatten(name='Flatten_for_output'),
Dropout(0.5, name='Output_dropout'),
Dense(32, activation='relu', name='Output_dense_1'),
Dense(1, activation='sigmoid', name='Classifier')
], name='Model_1')
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()
Model: "Model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Convolution_1 (Conv2D) (None, 47, 47, 32) 896
Max_Pool_1 (MaxPooling2D) (None, 23, 23, 32) 0
Convolution_2 (Conv2D) (None, 11, 11, 64) 18496
Max_Pool_2 (MaxPooling2D) (None, 5, 5, 64) 0
Flatten_for_output (Flatten) (None, 1600)              0
Output_dropout (Dropout) (None, 1600) 0
Output_dense_1 (Dense) (None, 32) 51232
Classifier (Dense) (None, 1) 33
=================================================================
Total params: 70657 (276.00 KB)
Trainable params: 70657 (276.00 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
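The spatial sizes in the summary follow the standard "valid"-padding formula, out = floor((in - kernel) / stride) + 1, applied alternately by the conv and pooling layers (a MaxPooling2D stride defaults to its pool size). A sketch reproducing the shapes above:

```python
def out_size(n, kernel, stride):
    """Spatial output size of a 'valid'-padded conv or pooling layer."""
    return (n - kernel) // stride + 1

n = 96
n = out_size(n, 3, 2)  # Convolution_1 (3x3, stride 2): 96 -> 47
n = out_size(n, 2, 2)  # Max_Pool_1    (2x2, stride 2): 47 -> 23
n = out_size(n, 3, 2)  # Convolution_2 (3x3, stride 2): 23 -> 11
n = out_size(n, 2, 2)  # Max_Pool_2    (2x2, stride 2): 11 -> 5
print(n)               # matches the (None, 5, 5, 64) shape in the summary
```

The same formula explains the 10x parameter jump in Iteration 3: with 1x1 strides the feature maps shrink much more slowly, so the Flatten layer feeds a far wider vector into the first Dense layer.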
if gpu_count > 0:
    history = model.fit(trn_data_gen, epochs=10, validation_data=val_data_gen)
    dump(history, '/content/drive/MyDrive/ML3_hist/history_01')
else:
    history = load('/content/drive/MyDrive/ML3_hist/history_01')
history.history['val_accuracy']
[0.8134757280349731, 0.8259743452072144, 0.846494734287262, 0.8544028997421265, 0.8599931597709656, 0.8663333654403687, 0.8682194948196411, 0.8591296672821045, 0.8760595321655273, 0.8780593276023865]
plt.plot(history.history['accuracy'], label='Training')
plt.plot(history.history['val_accuracy'], label = 'Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.title('Training vs Validation Accuracy by Epoch')
plt.legend(loc='lower right')
plt.show()
# Iteration 2: Add a third convolution layer
# Arch: 2x2 strides, 3 conv layers, 1 fully connected layer with 128 units
# Compile options: Adam, BCE
strides = (2,2)
model = Sequential([
Input((96,96,3), name='Input'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1'),
MaxPooling2D((2,2), name='Max_Pool_1'),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2'),
MaxPooling2D((2,2), name='Max_Pool_2'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3'),
MaxPooling2D((2,2), name='Max_Pool_3'),
Flatten(name='Flatten_for_output'),
Dropout(0.5, name='Output_dropout'),
Dense(128, activation='relu', name='Output_dense_1'),
Dense(1, activation='sigmoid', name='Classifier')
], name='Model_2')
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()
Model: "Model_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Convolution_1 (Conv2D) (None, 47, 47, 32) 896
Max_Pool_1 (MaxPooling2D) (None, 23, 23, 32) 0
Convolution_2 (Conv2D) (None, 11, 11, 64) 18496
Max_Pool_2 (MaxPooling2D) (None, 5, 5, 64) 0
Convolution_3 (Conv2D) (None, 2, 2, 128) 73856
Max_Pool_3 (MaxPooling2D) (None, 1, 1, 128) 0
Flatten_for_output (Flatten) (None, 128)               0
Output_dropout (Dropout) (None, 128) 0
Output_dense_1 (Dense) (None, 128) 16512
Classifier (Dense) (None, 1) 129
=================================================================
Total params: 109889 (429.25 KB)
Trainable params: 109889 (429.25 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
if gpu_count > 0:
    history = model.fit(trn_data_gen, epochs=10, validation_data=val_data_gen)
    dump(history, '/content/drive/MyDrive/ML3_hist/history_02')
else:
    history = load('/content/drive/MyDrive/ML3_hist/history_02')
history.history['val_accuracy']
[0.804226815700531, 0.8438131809234619, 0.8574934601783752, 0.8551982641220093, 0.8660833835601807, 0.8711510300636292, 0.8745824098587036, 0.8793091773986816, 0.87094646692276, 0.8765822052955627]
plt.plot(history.history['accuracy'], label='Training')
plt.plot(history.history['val_accuracy'], label = 'Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.title('Training vs Validation Accuracy by Epoch')
plt.legend(loc='lower right')
plt.show()
# Iteration 3: Same as 2 but with strides = 1x1
# Arch: 1x1 strides, 3 conv layers, 1 fully connected layer with 128 units
# Compile options: Adam, BCE
strides = (1,1)
model = Sequential([
Input((96,96,3), name='Input'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1'),
MaxPooling2D((2,2), name='Max_Pool_1'),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2', ),
MaxPooling2D((2,2), name='Max_Pool_2'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3'),
MaxPooling2D((2,2), name='Max_Pool_3'),
Flatten(name='Flatten_for_output'),
Dropout(0.5, name='Output_dropout'),
Dense(128, activation='relu', name='Output_dense_1'),
Dense(1, activation='sigmoid', name='Classifier')
], name='Model_3')
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()
Model: "Model_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Convolution_1 (Conv2D) (None, 94, 94, 32) 896
Max_Pool_1 (MaxPooling2D) (None, 47, 47, 32) 0
Convolution_2 (Conv2D) (None, 45, 45, 64) 18496
Max_Pool_2 (MaxPooling2D) (None, 22, 22, 64) 0
Flatten_for_output (Flatten) (None, 30976)             0
Output_dropout (Dropout) (None, 30976) 0
Output_dense_1 (Dense) (None, 32) 991264
Classifier (Dense) (None, 1) 33
=================================================================
Total params: 1010689 (3.86 MB)
Trainable params: 1010689 (3.86 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
if gpu_count > 0:
    history = model.fit(trn_data_gen, epochs=10, validation_data=val_data_gen)
    dump(history, '/content/drive/MyDrive/ML3_hist/history_03')
else:
    history = load('/content/drive/MyDrive/ML3_hist/history_03')
Epoch 1/10
688/688 [==============================] - 144s 206ms/step - loss: 0.5216 - accuracy: 0.7659 - val_loss: 0.4857 - val_accuracy: 0.7964
Epoch 2/10
688/688 [==============================] - 140s 203ms/step - loss: 0.4573 - accuracy: 0.8084 - val_loss: 0.4279 - val_accuracy: 0.8252
Epoch 3/10
688/688 [==============================] - 139s 203ms/step - loss: 0.4127 - accuracy: 0.8286 - val_loss: 0.3834 - val_accuracy: 0.8445
Epoch 4/10
688/688 [==============================] - 139s 202ms/step - loss: 0.3791 - accuracy: 0.8423 - val_loss: 0.3694 - val_accuracy: 0.8456
Epoch 5/10
688/688 [==============================] - 139s 202ms/step - loss: 0.3548 - accuracy: 0.8531 - val_loss: 0.3500 - val_accuracy: 0.8495
Epoch 6/10
688/688 [==============================] - 140s 203ms/step - loss: 0.3347 - accuracy: 0.8607 - val_loss: 0.3351 - val_accuracy: 0.8589
Epoch 7/10
688/688 [==============================] - 142s 206ms/step - loss: 0.3180 - accuracy: 0.8676 - val_loss: 0.3222 - val_accuracy: 0.8631
Epoch 8/10
688/688 [==============================] - 138s 200ms/step - loss: 0.2990 - accuracy: 0.8748 - val_loss: 0.2878 - val_accuracy: 0.8795
Epoch 9/10
688/688 [==============================] - 140s 204ms/step - loss: 0.2880 - accuracy: 0.8796 - val_loss: 0.2879 - val_accuracy: 0.8810
Epoch 10/10
688/688 [==============================] - 144s 209ms/step - loss: 0.2821 - accuracy: 0.8822 - val_loss: 0.2904 - val_accuracy: 0.8782
history = load('/content/drive/MyDrive/ML3_hist/history_03')
history.history['val_accuracy']
[0.7963867783546448, 0.8252471089363098, 0.8445404171943665, 0.8456084728240967, 0.8494943976402283, 0.8588569760322571, 0.8630610108375549, 0.8794682621955872, 0.8809908032417297, 0.8782411217689514]
dump(history, '/content/drive/MyDrive/ML3_hist/history_03')
plt.plot(history.history['accuracy'], label='Training')
plt.plot(history.history['val_accuracy'], label = 'Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.title('Training vs Validation Accuracy by Epoch')
plt.legend(loc='lower right')
plt.show()
# Iteration 4: Add a second Conv layer to each group
# Arch: 1x1 strides, 6 conv layers (2 per group), 2 fully connected layers (128, 64 units)
# Compile options: Adam, BCE
strides = (1,1)
model = Sequential([
Input((96,96,3), name='Input'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_1'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_2'),
MaxPooling2D((2,2), name='Max_Pool_1'),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_1', ),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_2', ),
MaxPooling2D((2,2), name='Max_Pool_2'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_1'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_2'),
MaxPooling2D((2,2), name='Max_Pool_3'),
Flatten(name='Flatten_for_output'),
Dropout(0.5, name='Output_dropout'),
Dense(128, activation='relu', name='Output_dense_1'),
Dense(64, activation='relu', name='Output_dense_2'),
Dense(1, activation='sigmoid', name='Classifier')
], name='Model_4')
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()
Model: "Model_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Convolution_1_1 (Conv2D) (None, 94, 94, 32) 896
Convolution_1_2 (Conv2D) (None, 92, 92, 32) 9248
Max_Pool_1 (MaxPooling2D) (None, 46, 46, 32) 0
Convolution_2_1 (Conv2D) (None, 44, 44, 64) 18496
Convolution_2_2 (Conv2D) (None, 42, 42, 64) 36928
Max_Pool_2 (MaxPooling2D) (None, 21, 21, 64) 0
Convolution_3_1 (Conv2D) (None, 19, 19, 128) 73856
Convolution_3_2 (Conv2D) (None, 17, 17, 128) 147584
Max_Pool_3 (MaxPooling2D) (None, 8, 8, 128) 0
Flatten_for_output (Flatten) (None, 8192)              0
Output_dropout (Dropout) (None, 8192) 0
Output_dense_1 (Dense) (None, 128) 1048704
Output_dense_2 (Dense) (None, 64) 8256
Classifier (Dense) (None, 1) 65
=================================================================
Total params: 1344033 (5.13 MB)
Trainable params: 1344033 (5.13 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
if gpu_count > 0:
    history = model.fit(trn_data_gen, epochs=10, validation_data=val_data_gen)
    dump(history, '/content/drive/MyDrive/ML3_hist/history_04')
else:
    history = load('/content/drive/MyDrive/ML3_hist/history_04')
Epoch 1/10
688/688 [==============================] - 147s 207ms/step - loss: 0.4400 - accuracy: 0.7971 - val_loss: 0.3505 - val_accuracy: 0.8465
Epoch 2/10
688/688 [==============================] - 140s 204ms/step - loss: 0.3100 - accuracy: 0.8687 - val_loss: 0.2737 - val_accuracy: 0.8866
Epoch 3/10
688/688 [==============================] - 138s 200ms/step - loss: 0.2573 - accuracy: 0.8954 - val_loss: 0.2375 - val_accuracy: 0.9042
Epoch 4/10
688/688 [==============================] - 140s 203ms/step - loss: 0.2253 - accuracy: 0.9114 - val_loss: 0.2143 - val_accuracy: 0.9178
Epoch 5/10
688/688 [==============================] - 144s 210ms/step - loss: 0.2073 - accuracy: 0.9188 - val_loss: 0.1955 - val_accuracy: 0.9245
Epoch 6/10
688/688 [==============================] - 143s 207ms/step - loss: 0.1926 - accuracy: 0.9255 - val_loss: 0.1912 - val_accuracy: 0.9257
Epoch 7/10
688/688 [==============================] - 139s 201ms/step - loss: 0.1740 - accuracy: 0.9342 - val_loss: 0.1902 - val_accuracy: 0.9259
Epoch 8/10
688/688 [==============================] - 140s 203ms/step - loss: 0.1643 - accuracy: 0.9381 - val_loss: 0.1731 - val_accuracy: 0.9351
Epoch 9/10
688/688 [==============================] - 138s 200ms/step - loss: 0.1544 - accuracy: 0.9421 - val_loss: 0.1680 - val_accuracy: 0.9383
Epoch 10/10
688/688 [==============================] - 140s 203ms/step - loss: 0.1418 - accuracy: 0.9465 - val_loss: 0.1709 - val_accuracy: 0.9371
history = load('/content/drive/MyDrive/ML3_hist/history_04')
history.history['val_accuracy']
[0.8465401530265808, 0.8865583539009094, 0.9041926860809326, 0.9178275465965271, 0.9245312809944153, 0.9256675243377686, 0.9259175062179565, 0.9350528120994568, 0.9383479356765747, 0.9370753169059753]
plt.plot(history.history['accuracy'], label='Training')
plt.plot(history.history['val_accuracy'], label = 'Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.title('Training vs Validation Accuracy by Epoch')
plt.legend(loc='lower right')
plt.show()
# Iteration 5: Back to 1 Conv layer per group but increase the number of filters
# Arch: 1x1 strides, 3 conv layers (64, 128, 256 filters), 2 fully connected layers (128, 64 units)
# Compile options: Adam, BCE
strides = (1,1)
model = Sequential([
Input((96,96,3), name='Input'),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_1_1'),
MaxPooling2D((2,2), name='Max_Pool_1'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_2_1', ),
MaxPooling2D((2,2), name='Max_Pool_2'),
Conv2D(256, (3,3), strides=strides, activation='relu', name='Convolution_3_1'),
MaxPooling2D((2,2), name='Max_Pool_3'),
Flatten(name='Flatten_for_output'),
Dropout(0.5, name='Output_dropout'),
Dense(128, activation='relu', name='Output_dense_1'),
Dense(64, activation='relu', name='Output_dense_2'),
Dense(1, activation='sigmoid', name='Classifier')
], name='Model_5')
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()
Model: "Model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Convolution_1_1 (Conv2D) (None, 94, 94, 64) 1792
Max_Pool_1 (MaxPooling2D) (None, 47, 47, 64) 0
Convolution_2_1 (Conv2D) (None, 45, 45, 128) 73856
Max_Pool_2 (MaxPooling2D) (None, 22, 22, 128) 0
Convolution_3_1 (Conv2D) (None, 20, 20, 256) 295168
Max_Pool_3 (MaxPooling2D) (None, 10, 10, 256) 0
Flatten_for_output (Flatten) (None, 25600)             0
Output_dropout (Dropout) (None, 25600) 0
Output_dense_1 (Dense) (None, 128) 3276928
Output_dense_2 (Dense) (None, 64) 8256
Classifier (Dense) (None, 1) 65
=================================================================
Total params: 3656065 (13.95 MB)
Trainable params: 3656065 (13.95 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
history = model.fit(trn_data_gen, epochs=10, validation_data=val_data_gen)
Epoch 1/10
688/688 [==============================] - 143s 202ms/step - loss: 0.4340 - accuracy: 0.8002 - val_loss: 0.3647 - val_accuracy: 0.8443
Epoch 2/10
688/688 [==============================] - 138s 200ms/step - loss: 0.3470 - accuracy: 0.8507 - val_loss: 0.3465 - val_accuracy: 0.8487
Epoch 3/10
688/688 [==============================] - 140s 203ms/step - loss: 0.3054 - accuracy: 0.8712 - val_loss: 0.2804 - val_accuracy: 0.8844
Epoch 4/10
688/688 [==============================] - 137s 199ms/step - loss: 0.2745 - accuracy: 0.8867 - val_loss: 0.2588 - val_accuracy: 0.8926
Epoch 5/10
688/688 [==============================] - 140s 203ms/step - loss: 0.2513 - accuracy: 0.8976 - val_loss: 0.2348 - val_accuracy: 0.9056
Epoch 6/10
688/688 [==============================] - 140s 204ms/step - loss: 0.2352 - accuracy: 0.9049 - val_loss: 0.2558 - val_accuracy: 0.8949
Epoch 7/10
688/688 [==============================] - 140s 204ms/step - loss: 0.2180 - accuracy: 0.9122 - val_loss: 0.2380 - val_accuracy: 0.9054
Epoch 8/10
688/688 [==============================] - 140s 204ms/step - loss: 0.2038 - accuracy: 0.9189 - val_loss: 0.2149 - val_accuracy: 0.9147
Epoch 9/10
688/688 [==============================] - 141s 204ms/step - loss: 0.1892 - accuracy: 0.9252 - val_loss: 0.2408 - val_accuracy: 0.9043
Epoch 10/10
688/688 [==============================] - 141s 205ms/step - loss: 0.1765 - accuracy: 0.9300 - val_loss: 0.2233 - val_accuracy: 0.9112
history = load('/content/drive/MyDrive/ML3_hist/history_05')
history.history['val_accuracy']
[0.8442904353141785, 0.8486762642860413, 0.8843767642974854, 0.8925803899765015, 0.9056016206741333, 0.8948982954025269, 0.9054198265075684, 0.9146687984466553, 0.9042835831642151, 0.9112146496772766]
dump(history, '/content/drive/MyDrive/ML3_hist/history_05')
plt.plot(history.history['accuracy'], label='Training')
plt.plot(history.history['val_accuracy'], label = 'Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.title('Training vs Validation Accuracy by Epoch')
plt.legend(loc='lower right')
plt.show()
# Iteration 6: Back to 2 Conv layers per group, more FC layers at the end
# Arch: 1x1 strides, 6 conv layers (2 per group), 4 fully connected layers (256/128/64/32 units)
# Compile options: Adam, BCE
strides = (1,1)
model = Sequential([
Input((96,96,3), name='Input'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_1'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_2'),
MaxPooling2D((2,2), name='Max_Pool_1'),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_1', ),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_2', ),
MaxPooling2D((2,2), name='Max_Pool_2'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_1'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_2'),
MaxPooling2D((2,2), name='Max_Pool_3'),
Flatten(name='Flatten_for_output'),
Dropout(0.5, name='Output_dropout'),
Dense(256, activation='relu', name='Output_dense_1'),
Dense(128, activation='relu', name='Output_dense_2'),
Dense(64, activation='relu', name='Output_dense_3'),
Dense(32, activation='relu', name='Output_dense_4'),
Dense(1, activation='sigmoid', name='Classifier')
], name='Model_3')
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()
Model: "Model_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Convolution_1_1 (Conv2D) (None, 94, 94, 32) 896
Convolution_1_2 (Conv2D) (None, 92, 92, 32) 9248
Max_Pool_1 (MaxPooling2D) (None, 46, 46, 32) 0
Convolution_2_1 (Conv2D) (None, 44, 44, 64) 18496
Convolution_2_2 (Conv2D) (None, 42, 42, 64) 36928
Max_Pool_2 (MaxPooling2D) (None, 21, 21, 64) 0
Convolution_3_1 (Conv2D) (None, 19, 19, 128) 73856
Convolution_3_2 (Conv2D) (None, 17, 17, 128) 147584
Max_Pool_3 (MaxPooling2D) (None, 8, 8, 128) 0
Flatten_for_output (Flatten) (None, 8192) 0
Output_dropout (Dropout) (None, 8192) 0
Output_dense_1 (Dense) (None, 256) 2097408
Output_dense_2 (Dense) (None, 128) 32896
Output_dense_3 (Dense) (None, 64) 8256
Output_dense_4 (Dense) (None, 32) 2080
Classifier (Dense) (None, 1) 33
=================================================================
Total params: 2427681 (9.26 MB)
Trainable params: 2427681 (9.26 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
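The parameter counts in the summary above follow simple closed forms: a Dense layer has in_features × units weights plus one bias per unit, and a Conv2D layer has (kernel_h × kernel_w × in_channels + 1) × filters parameters. A quick pure-Python check (not part of the original notebook) against the table:

```python
# Parameter-count formulas for the layer types used in the summary above.
def dense_params(n_in, n_units):
    # weight matrix (n_in x n_units) plus one bias per unit
    return n_in * n_units + n_units

def conv2d_params(kh, kw, c_in, filters):
    # one (kh x kw x c_in) kernel plus one bias, per output filter
    return (kh * kw * c_in + 1) * filters

print(conv2d_params(3, 3, 3, 32))   # Convolution_1_1: 896
print(conv2d_params(3, 3, 32, 32))  # Convolution_1_2: 9248
print(dense_params(8192, 256))      # Output_dense_1: 2097408
```

These match the 896, 9248, and 2097408 entries in the summary, which is a handy sanity check when the flattened feature size (here 8 × 8 × 128 = 8192) dominates the parameter budget.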
history = model.fit(trn_data_gen, epochs=10, validation_data=val_data_gen)
Epoch 1/10 688/688 [==============================] - 144s 205ms/step - loss: 0.4480 - accuracy: 0.7933 - val_loss: 0.3817 - val_accuracy: 0.8290
Epoch 2/10 688/688 [==============================] - 142s 207ms/step - loss: 0.3350 - accuracy: 0.8567 - val_loss: 0.3103 - val_accuracy: 0.8683
Epoch 3/10 688/688 [==============================] - 140s 204ms/step - loss: 0.2783 - accuracy: 0.8850 - val_loss: 0.2555 - val_accuracy: 0.8989
Epoch 4/10 688/688 [==============================] - 142s 206ms/step - loss: 0.2491 - accuracy: 0.8993 - val_loss: 0.2280 - val_accuracy: 0.9104
Epoch 5/10 688/688 [==============================] - 141s 205ms/step - loss: 0.2213 - accuracy: 0.9121 - val_loss: 0.2100 - val_accuracy: 0.9202
Epoch 6/10 688/688 [==============================] - 138s 200ms/step - loss: 0.1992 - accuracy: 0.9229 - val_loss: 0.2165 - val_accuracy: 0.9152
Epoch 7/10 688/688 [==============================] - 140s 203ms/step - loss: 0.1898 - accuracy: 0.9271 - val_loss: 0.2209 - val_accuracy: 0.9182
Epoch 8/10 688/688 [==============================] - 140s 204ms/step - loss: 0.1753 - accuracy: 0.9331 - val_loss: 0.1781 - val_accuracy: 0.9340
Epoch 9/10 688/688 [==============================] - 140s 203ms/step - loss: 0.1623 - accuracy: 0.9383 - val_loss: 0.1851 - val_accuracy: 0.9291
Epoch 10/10 688/688 [==============================] - 141s 204ms/step - loss: 0.1511 - accuracy: 0.9432 - val_loss: 0.1728 - val_accuracy: 0.9367
history = load('/content/drive/MyDrive/ML3_hist/history_06')
history.history['val_accuracy']
[0.8289967179298401, 0.8683103919029236, 0.8988751173019409, 0.9103510975837708, 0.920236349105835, 0.9151914715766907, 0.9182365536689758, 0.9340302348136902, 0.9291216731071472, 0.9366663098335266]
dump(history, '/content/drive/MyDrive/ML3_hist/history_6')
plt.plot(history.history['accuracy'], label='Training')
plt.plot(history.history['val_accuracy'], label = 'Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.title('Training vs Validation Accuracy by Epoch')
plt.legend(loc='lower right')
<matplotlib.legend.Legend at 0x7c7e3e9a24a0>
# Iteration 7: Raise the learning rate to 0.01 (architecture baseline is #4, which used lr=0.001)
# Arch: 1x1 strides, 3 blocks of 2 conv layers with 2x2 max pooling, 2 FC layers (128/64)
# Compile options: Adam optimizer, binary cross-entropy (BCE) loss
strides = (1,1)
model = Sequential([
Input((96,96,3), name='Input'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_1'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_2'),
MaxPooling2D((2,2), name='Max_Pool_1'),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_1', ),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_2', ),
MaxPooling2D((2,2), name='Max_Pool_2'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_1'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_2'),
MaxPooling2D((2,2), name='Max_Pool_3'),
Flatten(name='Flatten_for_output'),
Dropout(0.5, name='Output_dropout'),
Dense(128, activation='relu', name='Output_dense_1'),
Dense(64, activation='relu', name='Output_dense_2'),
Dense(1, activation='sigmoid', name='Classifier')
], name='Model_3')
model.compile(optimizer=Adam(learning_rate=0.01),
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()
Model: "Model_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Convolution_1_1 (Conv2D) (None, 94, 94, 32) 896
Convolution_1_2 (Conv2D) (None, 92, 92, 32) 9248
Max_Pool_1 (MaxPooling2D) (None, 46, 46, 32) 0
Convolution_2_1 (Conv2D) (None, 44, 44, 64) 18496
Convolution_2_2 (Conv2D) (None, 42, 42, 64) 36928
Max_Pool_2 (MaxPooling2D) (None, 21, 21, 64) 0
Convolution_3_1 (Conv2D) (None, 19, 19, 128) 73856
Convolution_3_2 (Conv2D) (None, 17, 17, 128) 147584
Max_Pool_3 (MaxPooling2D) (None, 8, 8, 128) 0
Flatten_for_output (Flatten) (None, 8192) 0
Output_dropout (Dropout) (None, 8192) 0
Output_dense_1 (Dense) (None, 128) 1048704
Output_dense_2 (Dense) (None, 64) 8256
Classifier (Dense) (None, 1) 65
=================================================================
Total params: 1344033 (5.13 MB)
Trainable params: 1344033 (5.13 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
history = model.fit(trn_data_gen, epochs=10, validation_data=val_data_gen)
Epoch 1/10 688/688 [==============================] - 144s 206ms/step - loss: 0.7175 - accuracy: 0.5931 - val_loss: 0.6746 - val_accuracy: 0.5961
Epoch 2/10 688/688 [==============================] - 139s 203ms/step - loss: 0.6752 - accuracy: 0.5947 - val_loss: 0.6747 - val_accuracy: 0.5961
Epoch 3/10 688/688 [==============================] - 142s 207ms/step - loss: 0.6752 - accuracy: 0.5947 - val_loss: 0.6746 - val_accuracy: 0.5961
Epoch 4/10 688/688 [==============================] - 140s 203ms/step - loss: 0.6752 - accuracy: 0.5947 - val_loss: 0.6746 - val_accuracy: 0.5961
Epoch 5/10 688/688 [==============================] - 143s 208ms/step - loss: 0.6752 - accuracy: 0.5947 - val_loss: 0.6746 - val_accuracy: 0.5961
Epoch 6/10 688/688 [==============================] - 139s 203ms/step - loss: 0.6752 - accuracy: 0.5947 - val_loss: 0.6747 - val_accuracy: 0.5961
Epoch 7/10 688/688 [==============================] - 142s 206ms/step - loss: 0.6752 - accuracy: 0.5947 - val_loss: 0.6745 - val_accuracy: 0.5961
Epoch 8/10 688/688 [==============================] - 144s 209ms/step - loss: 0.6752 - accuracy: 0.5947 - val_loss: 0.6745 - val_accuracy: 0.5961
Epoch 9/10 688/688 [==============================] - 143s 207ms/step - loss: 0.6751 - accuracy: 0.5947 - val_loss: 0.6746 - val_accuracy: 0.5961
Epoch 10/10 688/688 [==============================] - 145s 210ms/step - loss: 0.6752 - accuracy: 0.5947 - val_loss: 0.6747 - val_accuracy: 0.5961
history = load('/content/drive/MyDrive/ML3_hist/history_07')
history.history['val_accuracy']
[0.5961368083953857, 0.5961368083953857, 0.5961368083953857, 0.5961368083953857, 0.5961368083953857, 0.5961368083953857, 0.5961368083953857, 0.5961368083953857, 0.5961368083953857, 0.5961368083953857]
dump(history, '/content/drive/MyDrive/ML3_hist/history_7')
plt.plot(history.history['accuracy'], label='Training')
plt.plot(history.history['val_accuracy'], label = 'Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.title('Training vs Validation Accuracy by Epoch')
plt.legend(loc='lower right')
<matplotlib.legend.Legend at 0x7c7e3ed25c00>
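Validation accuracy is pinned at 0.5961 for all ten epochs, which strongly suggests the 0.01 learning rate destabilized training and the model collapsed into always predicting the majority (negative) class. As a rough check, using label counts I believe apply to the Kaggle training CSV (roughly 220,025 images, of which about 89,117 are positive; these counts are an assumption here, not computed in this notebook):

```python
# Hedged sanity check: accuracy of a constant "predict negative" classifier,
# assuming ~220,025 training images with ~89,117 labeled positive.
n_total = 220_025
n_positive = 89_117
majority_acc = (n_total - n_positive) / n_total
print(round(majority_acc, 4))  # ~0.595, close to the stuck 0.5961 above
```

The near-match between ~0.595 and the flat 0.5961 validation accuracy is consistent with a degenerate constant classifier rather than any learned signal.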
# Iteration 8: Exponential-decay learning-rate schedule (architecture baseline is #4, lr=0.001)
# Arch: 1x1 strides, 3 blocks of 2 conv layers with 2x2 max pooling, 2 FC layers (128/64)
# Compile options: Adam optimizer, binary cross-entropy (BCE) loss
strides = (1,1)
model = Sequential([
Input((96,96,3), name='Input'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_1'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_2'),
MaxPooling2D((2,2), name='Max_Pool_1'),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_1', ),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_2', ),
MaxPooling2D((2,2), name='Max_Pool_2'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_1'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_2'),
MaxPooling2D((2,2), name='Max_Pool_3'),
Flatten(name='Flatten_for_output'),
Dropout(0.5, name='Output_dropout'),
Dense(128, activation='relu', name='Output_dense_1'),
Dense(64, activation='relu', name='Output_dense_2'),
Dense(1, activation='sigmoid', name='Classifier')
], name='Model_3')
lr_schedule = ExponentialDecay(initial_learning_rate=0.001,
decay_steps=100000,
decay_rate=0.96,
staircase=True)
adam_optimizer = Adam(learning_rate=lr_schedule)
model.compile(optimizer=adam_optimizer,
loss='binary_crossentropy',
metrics=['accuracy'])
history = model.fit(trn_data_gen, epochs=10, validation_data=val_data_gen)
Epoch 1/10 688/688 [==============================] - 142s 203ms/step - loss: 0.4352 - accuracy: 0.8013 - val_loss: 0.3807 - val_accuracy: 0.8261
Epoch 2/10 688/688 [==============================] - 140s 203ms/step - loss: 0.3248 - accuracy: 0.8617 - val_loss: 0.2727 - val_accuracy: 0.8905
Epoch 3/10 688/688 [==============================] - 139s 202ms/step - loss: 0.2644 - accuracy: 0.8920 - val_loss: 0.2801 - val_accuracy: 0.8851
Epoch 4/10 688/688 [==============================] - 141s 205ms/step - loss: 0.2299 - accuracy: 0.9085 - val_loss: 0.2460 - val_accuracy: 0.9011
Epoch 5/10 688/688 [==============================] - 142s 207ms/step - loss: 0.2117 - accuracy: 0.9169 - val_loss: 0.2500 - val_accuracy: 0.8994
Epoch 6/10 688/688 [==============================] - 139s 202ms/step - loss: 0.1916 - accuracy: 0.9260 - val_loss: 0.2094 - val_accuracy: 0.9180
Epoch 7/10 688/688 [==============================] - 138s 201ms/step - loss: 0.1790 - accuracy: 0.9317 - val_loss: 0.1978 - val_accuracy: 0.9251
Epoch 8/10 688/688 [==============================] - 140s 203ms/step - loss: 0.1683 - accuracy: 0.9358 - val_loss: 0.1644 - val_accuracy: 0.9386
Epoch 9/10 688/688 [==============================] - 142s 206ms/step - loss: 0.1595 - accuracy: 0.9399 - val_loss: 0.1653 - val_accuracy: 0.9389
Epoch 10/10 688/688 [==============================] - 141s 205ms/step - loss: 0.1501 - accuracy: 0.9436 - val_loss: 0.1740 - val_accuracy: 0.9347
history = load('/content/drive/MyDrive/ML3_hist/history_08')
history.history['val_accuracy']
[0.8260879516601562, 0.8905124664306641, 0.8850585222244263, 0.9011476039886475, 0.8993977904319763, 0.9179865717887878, 0.9250994324684143, 0.9385524392127991, 0.9388933181762695, 0.9346665143966675]
dump(history, '/content/drive/MyDrive/ML3_hist/history_8')
plt.plot(history.history['accuracy'], label='Training')
plt.plot(history.history['val_accuracy'], label = 'Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.title('Training vs Validation Accuracy by Epoch')
plt.legend(loc='lower right')
<matplotlib.legend.Legend at 0x7c7e3edd3250>
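One caveat worth noting about the Iteration 8 schedule: at 688 batches per epoch, a 10-epoch run is only ~6,880 optimizer steps, so with `decay_steps=100000` and `staircase=True` the learning rate never actually decays during this run. A pure-Python sketch of the staircase formula (mirroring how I understand `ExponentialDecay` is documented to behave) makes this visible:

```python
# Staircase exponential decay: lr = initial * rate ** floor(step / decay_steps),
# mirroring the documented Keras ExponentialDecay behavior (staircase=True).
def staircase_lr(step, initial=0.001, decay_steps=100_000, decay_rate=0.96):
    return initial * decay_rate ** (step // decay_steps)

steps_per_epoch = 688
print(staircase_lr(0))                     # 0.001 at the start
print(staircase_lr(steps_per_epoch * 10))  # still 0.001 after 10 epochs
print(staircase_lr(100_000))               # ~0.00096 -- first decay only at step 100000
```

So any difference from Iteration 4 here is more likely run-to-run noise than the schedule itself; a smaller `decay_steps` (on the order of a few epochs' worth of batches) would be needed for the decay to matter within this training budget.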
# Iteration 9: Baseline architecture from #4, but train for 20 epochs
# Arch: 1x1 strides, 3 blocks of 2 conv layers with 2x2 max pooling, 2 FC layers (128/64)
# Compile options: Adam optimizer, binary cross-entropy (BCE) loss
strides = (1,1)
model = Sequential([
Input((96,96,3), name='Input'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_1'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_2'),
MaxPooling2D((2,2), name='Max_Pool_1'),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_1', ),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_2', ),
MaxPooling2D((2,2), name='Max_Pool_2'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_1'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_2'),
MaxPooling2D((2,2), name='Max_Pool_3'),
Flatten(name='Flatten_for_output'),
Dropout(0.5, name='Output_dropout'),
Dense(128, activation='relu', name='Output_dense_1'),
Dense(64, activation='relu', name='Output_dense_2'),
Dense(1, activation='sigmoid', name='Classifier')
], name='Model_3')
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()
Model: "Model_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Convolution_1_1 (Conv2D) (None, 94, 94, 32) 896
Convolution_1_2 (Conv2D) (None, 92, 92, 32) 9248
Max_Pool_1 (MaxPooling2D) (None, 46, 46, 32) 0
Convolution_2_1 (Conv2D) (None, 44, 44, 64) 18496
Convolution_2_2 (Conv2D) (None, 42, 42, 64) 36928
Max_Pool_2 (MaxPooling2D) (None, 21, 21, 64) 0
Convolution_3_1 (Conv2D) (None, 19, 19, 128) 73856
Convolution_3_2 (Conv2D) (None, 17, 17, 128) 147584
Max_Pool_3 (MaxPooling2D) (None, 8, 8, 128) 0
Flatten_for_output (Flatten) (None, 8192) 0
Output_dropout (Dropout) (None, 8192) 0
Output_dense_1 (Dense) (None, 128) 1048704
Output_dense_2 (Dense) (None, 64) 8256
Classifier (Dense) (None, 1) 65
=================================================================
Total params: 1344033 (5.13 MB)
Trainable params: 1344033 (5.13 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
history = model.fit(trn_data_gen, epochs=20, validation_data=val_data_gen)
Epoch 1/20 688/688 [==============================] - 141s 201ms/step - loss: 0.4417 - accuracy: 0.7974 - val_loss: 0.3597 - val_accuracy: 0.8449
Epoch 2/20 688/688 [==============================] - 138s 201ms/step - loss: 0.3362 - accuracy: 0.8555 - val_loss: 0.2817 - val_accuracy: 0.8846
Epoch 3/20 688/688 [==============================] - 140s 203ms/step - loss: 0.2730 - accuracy: 0.8872 - val_loss: 0.2398 - val_accuracy: 0.9038
Epoch 4/20 688/688 [==============================] - 139s 202ms/step - loss: 0.2380 - accuracy: 0.9045 - val_loss: 0.2157 - val_accuracy: 0.9155
Epoch 5/20 688/688 [==============================] - 138s 200ms/step - loss: 0.2112 - accuracy: 0.9170 - val_loss: 0.1990 - val_accuracy: 0.9220
Epoch 6/20 688/688 [==============================] - 137s 200ms/step - loss: 0.1959 - accuracy: 0.9234 - val_loss: 0.1973 - val_accuracy: 0.9234
Epoch 7/20 688/688 [==============================] - 137s 200ms/step - loss: 0.1828 - accuracy: 0.9297 - val_loss: 0.1782 - val_accuracy: 0.9334
Epoch 8/20 688/688 [==============================] - 139s 202ms/step - loss: 0.1678 - accuracy: 0.9359 - val_loss: 0.1792 - val_accuracy: 0.9317
Epoch 9/20 688/688 [==============================] - 138s 201ms/step - loss: 0.1578 - accuracy: 0.9401 - val_loss: 0.1790 - val_accuracy: 0.9339
Epoch 10/20 688/688 [==============================] - 139s 202ms/step - loss: 0.1489 - accuracy: 0.9438 - val_loss: 0.1843 - val_accuracy: 0.9292
Epoch 11/20 688/688 [==============================] - 139s 202ms/step - loss: 0.1378 - accuracy: 0.9484 - val_loss: 0.1721 - val_accuracy: 0.9395
Epoch 12/20 688/688 [==============================] - 140s 204ms/step - loss: 0.1268 - accuracy: 0.9528 - val_loss: 0.1746 - val_accuracy: 0.9361
Epoch 13/20 688/688 [==============================] - 141s 205ms/step - loss: 0.1215 - accuracy: 0.9552 - val_loss: 0.1697 - val_accuracy: 0.9408
Epoch 14/20 688/688 [==============================] - 138s 200ms/step - loss: 0.1130 - accuracy: 0.9580 - val_loss: 0.1654 - val_accuracy: 0.9428
Epoch 15/20 688/688 [==============================] - 137s 199ms/step - loss: 0.1045 - accuracy: 0.9614 - val_loss: 0.1785 - val_accuracy: 0.9385
Epoch 16/20 688/688 [==============================] - 138s 201ms/step - loss: 0.0989 - accuracy: 0.9631 - val_loss: 0.1853 - val_accuracy: 0.9386
Epoch 17/20 688/688 [==============================] - 141s 205ms/step - loss: 0.0923 - accuracy: 0.9659 - val_loss: 0.1899 - val_accuracy: 0.9399
Epoch 18/20 688/688 [==============================] - 139s 202ms/step - loss: 0.0882 - accuracy: 0.9671 - val_loss: 0.1951 - val_accuracy: 0.9350
Epoch 19/20 688/688 [==============================] - 139s 202ms/step - loss: 0.0825 - accuracy: 0.9693 - val_loss: 0.2239 - val_accuracy: 0.9288
Epoch 20/20 688/688 [==============================] - 138s 200ms/step - loss: 0.0797 - accuracy: 0.9703 - val_loss: 0.1972 - val_accuracy: 0.9371
history = load('/content/drive/MyDrive/ML3_hist/history_09')
history.history['val_accuracy']
[0.8448585271835327, 0.8846040368080139, 0.9037836790084839, 0.9155095815658569, 0.922031581401825, 0.9234178066253662, 0.9333939552307129, 0.9317123293876648, 0.9339393377304077, 0.9291898608207703, 0.9395068883895874, 0.9360981583595276, 0.940779447555542, 0.9427792429924011, 0.9385069608688354, 0.9385751485824585, 0.9399386644363403, 0.9349846839904785, 0.928803563117981, 0.9371435046195984]
dump(history, '/content/drive/MyDrive/ML3_hist/history_9')
plt.plot(history.history['accuracy'], label='Training')
plt.plot(history.history['val_accuracy'], label = 'Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.title('Training vs Validation Accuracy by Epoch')
plt.legend(loc='lower right')
<matplotlib.legend.Legend at 0x7c7e3e1af310>
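In this run, validation accuracy peaks at epoch 14 (0.9428) and drifts down afterward while training accuracy keeps climbing, which is the classic overfitting pattern an early-stopping callback is meant to catch. A minimal pure-Python sketch of the patience logic, applied to the `val_accuracy` list above with a hypothetical patience of 3 (in Keras this would be `EarlyStopping(monitor='val_accuracy', patience=3, restore_best_weights=True)` passed via `model.fit(..., callbacks=[...])`):

```python
def early_stop(val_acc, patience=3):
    """Return (stop_epoch, best_epoch), 0-based, emulating patience-based early stopping."""
    best, best_epoch = float('-inf'), 0
    for i, acc in enumerate(val_acc):
        if acc > best:
            best, best_epoch = acc, i
        elif i - best_epoch >= patience:
            return i, best_epoch  # no improvement for `patience` epochs: stop here
    return len(val_acc) - 1, best_epoch

# Iteration 9 validation accuracies, rounded to 4 places
val_acc = [0.8449, 0.8846, 0.9038, 0.9155, 0.9220, 0.9234, 0.9334, 0.9317,
           0.9339, 0.9292, 0.9395, 0.9361, 0.9408, 0.9428, 0.9385, 0.9386,
           0.9399, 0.9350, 0.9288, 0.9371]
print(early_stop(val_acc))  # (16, 13): stop after epoch 17, keep epoch 14's weights
```

With `restore_best_weights=True` this would have saved about six epochs of compute here and kept the best checkpoint rather than the final, slightly overfit one.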
# Iteration 10: More epochs (20) combined with the exponential-decay learning-rate schedule
# Arch: 1x1 strides, 3 blocks of 2 conv layers with 2x2 max pooling, 2 FC layers (128/64)
# Compile options: Adam optimizer, binary cross-entropy (BCE) loss
strides = (1,1)
model = Sequential([
Input((96,96,3), name='Input'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_1'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_2'),
MaxPooling2D((2,2), name='Max_Pool_1'),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_1', ),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_2', ),
MaxPooling2D((2,2), name='Max_Pool_2'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_1'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_2'),
MaxPooling2D((2,2), name='Max_Pool_3'),
Flatten(name='Flatten_for_output'),
Dropout(0.5, name='Output_dropout'),
Dense(128, activation='relu', name='Output_dense_1'),
Dense(64, activation='relu', name='Output_dense_2'),
Dense(1, activation='sigmoid', name='Classifier')
], name='Model_3')
lr_schedule = ExponentialDecay(initial_learning_rate=0.001,
decay_steps=100000,
decay_rate=0.96,
staircase=True)
adam_optimizer = Adam(learning_rate=lr_schedule)
model.compile(optimizer=adam_optimizer,
loss='binary_crossentropy',
metrics=['accuracy'])
history = model.fit(trn_data_gen, epochs=20, validation_data=val_data_gen)
Epoch 1/20 688/688 [==============================] - 140s 201ms/step - loss: 0.4444 - accuracy: 0.7957 - val_loss: 0.3717 - val_accuracy: 0.8396
Epoch 2/20 688/688 [==============================] - 137s 199ms/step - loss: 0.3354 - accuracy: 0.8559 - val_loss: 0.2974 - val_accuracy: 0.8747
Epoch 3/20 688/688 [==============================] - 141s 205ms/step - loss: 0.2689 - accuracy: 0.8891 - val_loss: 0.2404 - val_accuracy: 0.9028
Epoch 4/20 688/688 [==============================] - 138s 201ms/step - loss: 0.2347 - accuracy: 0.9063 - val_loss: 0.2194 - val_accuracy: 0.9148
Epoch 5/20 688/688 [==============================] - 139s 202ms/step - loss: 0.2084 - accuracy: 0.9182 - val_loss: 0.2398 - val_accuracy: 0.9056
Epoch 6/20 688/688 [==============================] - 138s 201ms/step - loss: 0.1901 - accuracy: 0.9266 - val_loss: 0.2115 - val_accuracy: 0.9176
Epoch 7/20 688/688 [==============================] - 139s 202ms/step - loss: 0.1764 - accuracy: 0.9327 - val_loss: 0.1762 - val_accuracy: 0.9345
Epoch 8/20 688/688 [==============================] - 138s 201ms/step - loss: 0.1623 - accuracy: 0.9387 - val_loss: 0.1830 - val_accuracy: 0.9303
Epoch 9/20 688/688 [==============================] - 137s 200ms/step - loss: 0.1523 - accuracy: 0.9430 - val_loss: 0.1780 - val_accuracy: 0.9330
Epoch 10/20 688/688 [==============================] - 137s 199ms/step - loss: 0.1428 - accuracy: 0.9469 - val_loss: 0.1702 - val_accuracy: 0.9376
Epoch 11/20 688/688 [==============================] - 139s 202ms/step - loss: 0.1283 - accuracy: 0.9528 - val_loss: 0.1726 - val_accuracy: 0.9395
Epoch 12/20 688/688 [==============================] - 139s 202ms/step - loss: 0.1240 - accuracy: 0.9539 - val_loss: 0.1886 - val_accuracy: 0.9350
Epoch 13/20 688/688 [==============================] - 137s 199ms/step - loss: 0.1126 - accuracy: 0.9588 - val_loss: 0.1852 - val_accuracy: 0.9319
Epoch 14/20 688/688 [==============================] - 138s 201ms/step - loss: 0.1056 - accuracy: 0.9609 - val_loss: 0.1751 - val_accuracy: 0.9357
Epoch 15/20 688/688 [==============================] - 141s 204ms/step - loss: 0.0962 - accuracy: 0.9646 - val_loss: 0.1748 - val_accuracy: 0.9402
Epoch 16/20 688/688 [==============================] - 139s 202ms/step - loss: 0.0937 - accuracy: 0.9653 - val_loss: 0.1759 - val_accuracy: 0.9401
Epoch 17/20 688/688 [==============================] - 139s 203ms/step - loss: 0.0844 - accuracy: 0.9690 - val_loss: 0.1870 - val_accuracy: 0.9424
Epoch 18/20 688/688 [==============================] - 136s 198ms/step - loss: 0.0787 - accuracy: 0.9708 - val_loss: 0.1830 - val_accuracy: 0.9429
Epoch 19/20 688/688 [==============================] - 139s 202ms/step - loss: 0.0789 - accuracy: 0.9708 - val_loss: 0.1839 - val_accuracy: 0.9407
Epoch 20/20 688/688 [==============================] - 137s 199ms/step - loss: 0.0718 - accuracy: 0.9733 - val_loss: 0.1763 - val_accuracy: 0.9416
history = load('/content/drive/MyDrive/ML3_hist/history_10')
history.history['val_accuracy']
[0.8396318554878235, 0.8747187852859497, 0.9027610421180725, 0.9148051142692566, 0.9055789113044739, 0.9175547957420349, 0.9344847202301025, 0.9302806258201599, 0.9330075979232788, 0.9376434683799744, 0.939484179019928, 0.9349619150161743, 0.9319168329238892, 0.9357345700263977, 0.9402340650558472, 0.9400749802589417, 0.9424383640289307, 0.942892849445343, 0.9407340288162231, 0.9416429996490479]
dump(history, '/content/drive/MyDrive/ML3_hist/history_10')
plt.plot(history.history['accuracy'], label='Training')
plt.plot(history.history['val_accuracy'], label = 'Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.title('Training vs Validation Accuracy by Epoch')
plt.legend(loc='lower right')
<matplotlib.legend.Legend at 0x7c7e3dda7820>
5. Results and Discussion¶
Overall, I'm pretty happy with the results. By changing the model architecture, I was able to achieve small but meaningful improvements in the prediction scores without requiring a huge number of parameters. The one clear failure was Iteration 7, where raising the learning rate to 0.01 stalled training at roughly the majority-class accuracy (~0.596). Using an exponential-decay learning-rate schedule allowed relatively quick convergence without overfitting.
Given more time to improve the model, I might try the RMSProp optimizer and different parameters for the learning-rate scheduler.
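For reference, the core idea behind the RMSProp optimizer I'd try is to divide each update by a running RMS of recent gradients, so parameters with noisy or large gradients take smaller steps. A minimal single-parameter sketch of that update rule (illustrative only; in Keras one would simply pass `optimizer=RMSprop(learning_rate=...)` to `model.compile`):

```python
def rmsprop_step(w, g, s, lr=0.001, rho=0.9, eps=1e-7):
    # s: exponential moving average of squared gradients;
    # larger recent gradients -> larger s -> smaller effective step
    s = rho * s + (1 - rho) * g * g
    w = w - lr * g / (eps + s ** 0.5)
    return w, s

# One update from w=1.0 with gradient 0.5 and no accumulated history
w, s = rmsprop_step(w=1.0, g=0.5, s=0.0)
print(w, s)
```

Because the step is normalized by the gradient magnitude, RMSProp can be less sensitive to the raw learning-rate choice than plain SGD, which is relevant given how badly the fixed 0.01 rate behaved in Iteration 7.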
6. Test predictions and Kaggle submission¶
I retrained a model using the settings from Iteration 10, generated predictions for the test set, and submitted them to Kaggle for scoring, achieving a score of 0.9369 on the public leaderboard.
# Iteration 10 settings: 20 epochs combined with the exponential-decay learning-rate schedule
# Arch: 1x1 strides, 3 blocks of 2 conv layers with 2x2 max pooling, 2 FC layers (128/64)
# Compile options: Adam optimizer, binary cross-entropy (BCE) loss
strides = (1,1)
model = Sequential([
Input((96,96,3), name='Input'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_1'),
Conv2D(32, (3,3), strides=strides, activation='relu', name='Convolution_1_2'),
MaxPooling2D((2,2), name='Max_Pool_1'),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_1', ),
Conv2D(64, (3,3), strides=strides, activation='relu', name='Convolution_2_2', ),
MaxPooling2D((2,2), name='Max_Pool_2'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_1'),
Conv2D(128, (3,3), strides=strides, activation='relu', name='Convolution_3_2'),
MaxPooling2D((2,2), name='Max_Pool_3'),
Flatten(name='Flatten_for_output'),
Dropout(0.5, name='Output_dropout'),
Dense(128, activation='relu', name='Output_dense_1'),
Dense(64, activation='relu', name='Output_dense_2'),
Dense(1, activation='sigmoid', name='Classifier')
], name='Model_3')
lr_schedule = ExponentialDecay(initial_learning_rate=0.001,
decay_steps=100000,
decay_rate=0.96,
staircase=True)
adam_optimizer = Adam(learning_rate=lr_schedule)
model.compile(optimizer=adam_optimizer,
loss='binary_crossentropy',
metrics=['accuracy'])
%%time
model.fit(trn_data_gen, epochs=20, validation_data=val_data_gen)
Epoch 1/20 688/688 [==============================] - 148s 204ms/step - loss: 0.4386 - accuracy: 0.7985 - val_loss: 0.3852 - val_accuracy: 0.8355
Epoch 2/20 688/688 [==============================] - 142s 207ms/step - loss: 0.3190 - accuracy: 0.8646 - val_loss: 0.2800 - val_accuracy: 0.8856
Epoch 3/20 688/688 [==============================] - 135s 196ms/step - loss: 0.2591 - accuracy: 0.8939 - val_loss: 0.2326 - val_accuracy: 0.9097
Epoch 4/20 688/688 [==============================] - 142s 206ms/step - loss: 0.2270 - accuracy: 0.9103 - val_loss: 0.2084 - val_accuracy: 0.9181
Epoch 5/20 688/688 [==============================] - 138s 200ms/step - loss: 0.2059 - accuracy: 0.9196 - val_loss: 0.2064 - val_accuracy: 0.9201
Epoch 6/20 688/688 [==============================] - 136s 197ms/step - loss: 0.1881 - accuracy: 0.9281 - val_loss: 0.1965 - val_accuracy: 0.9233
Epoch 7/20 688/688 [==============================] - 132s 191ms/step - loss: 0.1760 - accuracy: 0.9325 - val_loss: 0.1823 - val_accuracy: 0.9300
Epoch 8/20 688/688 [==============================] - 133s 193ms/step - loss: 0.1635 - accuracy: 0.9382 - val_loss: 0.1838 - val_accuracy: 0.9288
Epoch 9/20 688/688 [==============================] - 141s 204ms/step - loss: 0.1506 - accuracy: 0.9434 - val_loss: 0.2026 - val_accuracy: 0.9220
Epoch 10/20 688/688 [==============================] - 132s 191ms/step - loss: 0.1416 - accuracy: 0.9470 - val_loss: 0.1827 - val_accuracy: 0.9335
Epoch 11/20 688/688 [==============================] - 136s 197ms/step - loss: 0.1323 - accuracy: 0.9508 - val_loss: 0.1843 - val_accuracy: 0.9287
Epoch 12/20 688/688 [==============================] - 138s 201ms/step - loss: 0.1200 - accuracy: 0.9555 - val_loss: 0.1666 - val_accuracy: 0.9408
Epoch 13/20 688/688 [==============================] - 137s 199ms/step - loss: 0.1131 - accuracy: 0.9585 - val_loss: 0.1775 - val_accuracy: 0.9345
Epoch 14/20 688/688 [==============================] - 130s 189ms/step - loss: 0.1055 - accuracy: 0.9614 - val_loss: 0.1694 - val_accuracy: 0.9414
Epoch 15/20 688/688 [==============================] - 136s 197ms/step - loss: 0.0982 - accuracy: 0.9635 - val_loss: 0.1605 - val_accuracy: 0.9439
Epoch 16/20 688/688 [==============================] - 140s 203ms/step - loss: 0.0936 - accuracy: 0.9654 - val_loss: 0.1823 - val_accuracy: 0.9343
Epoch 17/20 688/688 [==============================] - 134s 195ms/step - loss: 0.0872 - accuracy: 0.9678 - val_loss: 0.1717 - val_accuracy: 0.9431
Epoch 18/20 688/688 [==============================] - 134s 195ms/step - loss: 0.0827 - accuracy: 0.9699 - val_loss: 0.1656 - val_accuracy: 0.9437
Epoch 19/20 688/688 [==============================] - 129s 188ms/step - loss: 0.0773 - accuracy: 0.9719 - val_loss: 0.1793 - val_accuracy: 0.9378
Epoch 20/20 688/688 [==============================] - 133s 193ms/step - loss: 0.0727 - accuracy: 0.9734 - val_loss: 0.1786 - val_accuracy: 0.9429
CPU times: user 50min 59s, sys: 5min 22s, total: 56min 21s
Wall time: 45min 25s
<keras.src.callbacks.History at 0x7a0e93633940>
preds = model.predict(tst_data_gen)
dump(preds, '/content/drive/MyDrive/kaggle/ml3_preds')
preds = load('/content/drive/MyDrive/kaggle/ml3_preds')
225/225 [==============================] - 35s 155ms/step
['/content/drive/MyDrive/kaggle/ml3_preds']
test_df['id'] = test_df['file_name'].apply(lambda x: x.split('.')[0])
test_df['label'] = preds.reshape(-1)
submission = test_df[['id', 'label']]
submission.to_csv('/content/drive/MyDrive/kaggle/submission.csv', index=False)
submission
|  | id | label |
|---|---|---|
| 0 | 1c9de83a0cb3e8918884719a158fc4cad3f9d1af | 0.014797 |
| 1 | b383c963d3236b55a941f9a9503d198ff5491116 | 0.685709 |
| 2 | 35b99f7e8df4882ade0ff57aa0f2ae511911b371 | 0.970075 |
| 3 | f78b7600773617b56ec78af1cea5e827422a2a80 | 0.982597 |
| 4 | 9c2041bad259eecdf62dcd24c9a75c14a37b6363 | 0.005376 |
| ... | ... | ... |
| 57453 | 680382b1e26f22d8b36f5b809b04d6920bc78607 | 0.993812 |
| 57454 | 9ee5a93349fb649335787585ed5f77d6a8185054 | 0.017764 |
| 57455 | a3a4e7a165fdb629fa995317e3393372f19ad267 | 0.290335 |
| 57456 | 35a2ab6b18fd10d3125144146711e62c79edc52c | 0.996142 |
| 57457 | f2614e68667e9c980b5f9ee61b09d19c96aed067 | 0.977098 |
57458 rows × 2 columns